As in [publication], we perform a cluster-level QC to remove clusters of poorer quality. Quality is assessed by the number of UMI counts, the mitochondrial percentage and the number of mice that contribute to each cluster. To do so we use a small cluster resolution, 5.
Low UMI counts and low numbers of detected genes can be associated with lower-quality cells. However, cells can also express fewer genes because of their biological state or cell type.
Select clusters in which 50% of the cells have fewer than 3000 UMI counts.
The clusters flagged are 4, 13, 15, 17, 21.
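The low-UMI selection described above can be sketched as follows. This is a minimal illustration on toy data, not the code used in the analysis: the data frame `cells`, its column names and the choice of "at least 50%" (rather than strictly more than 50%) are assumptions.

```r
# Toy per-cell metadata; the column names are hypothetical stand-ins for
# the per-cell cluster label and total UMI count.
cells <- data.frame(
  cluster = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)),
  umi     = c(5200, 4100, 2500, 6100,   # cluster 1: 1 of 4 cells below 3000
              1800, 2900, 2200, 4500,   # cluster 2: 3 of 4 cells below 3000
              3500, 2000, 2999, 8000)   # cluster 3: 2 of 4 cells below 3000
)

umi_threshold <- 3000  # from the text
cell_fraction <- 0.5   # from the text; ">=" versus ">" is an assumption

# Fraction of cells in each cluster with fewer than `umi_threshold` UMI counts
frac_low_umi <- tapply(cells$umi < umi_threshold, cells$cluster, mean)

# Flag clusters where at least half of the cells fall below the threshold
low_umi_clusters <- names(frac_low_umi)[frac_low_umi >= cell_fraction]
low_umi_clusters
```

Taking the mean of a logical vector gives the proportion of `TRUE` values, so `tapply()` returns one fraction per cluster without an explicit loop.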
A high percentage of mitochondrial genes is associated with stressed, lower-quality cells.
Select clusters in which 50% of the cells have more than 10% mitochondrial genes.
The clusters flagged are 8, 9, 13.
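A per-cluster summary table makes the mitochondrial criterion easy to inspect. The sketch below runs on toy data; the data frame `cells`, the column `pct_mt` and the use of "at least 50%" are assumptions made for illustration.

```r
# Toy per-cell metadata; `pct_mt` is a hypothetical column holding the
# percentage of counts that come from mitochondrial genes.
cells <- data.frame(
  cluster = factor(c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3)),
  pct_mt  = c(2.1, 12.5, 4.0,          # cluster 1: 1 of 3 cells above 10%
              15.2, 11.1, 3.3, 18.9,   # cluster 2: 3 of 4 cells above 10%
              9.9, 10.0, 6.5)          # cluster 3: 0 of 3 cells above 10%
)

mt_threshold <- 10  # percent, from the text

# One row per cluster with the fraction of cells above the threshold
summary_mt <- aggregate(pct_mt ~ cluster, data = cells,
                        FUN = function(x) mean(x > mt_threshold))
names(summary_mt)[2] <- "frac_high_mt"

# Flag clusters in which at least half of the cells exceed the threshold
summary_mt$flagged <- summary_mt$frac_high_mt >= 0.5
summary_mt
```

Keeping the fraction alongside the flag shows how close each cluster sits to the cut-off, which helps when judging borderline clusters.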
How many mice contribute to each cluster?
##
##       1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
##   KO  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  1  3  3  0
##   WT  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
Apart from the obvious microglia clusters, where the numbers are very low or even zero in the fire mice, nothing stands out.
Before deleting any clusters, we want to make sure that the differences in quality are not simply due to the mice being KO.
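A table like the one above can be produced by counting each mouse once per cluster. The following is a minimal sketch on toy data; the data frame `cells` and its columns `genotype`, `mouse` and `cluster` are hypothetical names.

```r
# Toy per-cell metadata: one row per cell
cells <- data.frame(
  genotype = c("KO", "KO", "KO", "KO", "WT", "WT", "WT",
               "WT", "WT", "KO", "WT", "WT"),
  mouse    = c("KO1", "KO1", "KO2", "KO3", "WT1", "WT2", "WT3",
               "WT1", "WT2", "KO1", "WT3", "WT3"),
  cluster  = factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3), levels = 1:3)
)

# Keep one row per mouse/cluster combination so that each mouse is
# counted once, however many of its cells fall in the cluster
mouse_cluster <- unique(cells[, c("genotype", "mouse", "cluster")])

# Number of distinct mice per genotype (rows) and cluster (columns)
mice_per_cluster <- table(mouse_cluster$genotype, mouse_cluster$cluster)
mice_per_cluster
```

Because `cluster` is a factor with explicit levels, clusters with no contributing mice of a genotype still appear in the table with a count of 0.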
## Warning: Removed 1 rows containing missing values (geom_text).
Visualise this in a plot.
We consider that if the difference between the two groups is greater than 60, the cluster may differ because of a biological condition. It is then better not to remove it for now, even if it has a high percentage of mitochondrial genes or low UMI counts.
We then filter out the clusters flagged for low UMI counts or high mitochondrial percentage that do not show a large difference between control and treatment.
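One way to compute such a difference is sketched below. It assumes that "difference" means the absolute gap, in percentage points, between the share of a cluster's cells that come from KO mice and the share that come from WT mice; the source does not state the exact definition, and the data frame `cells` is toy data.

```r
# Toy per-cell metadata with a cluster label and a genotype per cell
cells <- data.frame(
  cluster  = factor(c(rep(1, 10), rep(2, 10))),
  genotype = c(rep("KO", 5), rep("WT", 5),   # cluster 1: 50% KO, 50% WT
               rep("KO", 9), rep("WT", 1))   # cluster 2: 90% KO, 10% WT
)

# Percentage of each cluster's cells per genotype (each row sums to 100)
pct <- 100 * prop.table(table(cells$cluster, cells$genotype), margin = 1)

# Absolute KO-WT difference per cluster, in percentage points (assumed metric)
diff_pct <- abs(pct[, "KO"] - pct[, "WT"])

difference_threshold <- 60  # from the text

# Clusters whose composition differs strongly between the two groups
biological_clusters <- names(diff_pct)[diff_pct > difference_threshold]
biological_clusters
```

Raw cell percentages are sensitive to the total number of cells recovered per genotype, so a real analysis might first normalise for that; this sketch does not.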
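The final filter amounts to a few set operations, sketched here. The low-UMI and high-mitochondrial cluster lists are taken from the text, but `protected` is a placeholder: the source does not list which clusters exceed the difference threshold, so the value below is purely illustrative, as is the toy vector `cell_clusters`.

```r
low_umi <- c(4, 13, 15, 17, 21)  # flagged for low UMI counts (from the text)
high_mt <- c(8, 9, 13)           # flagged for high mitochondrial % (from the text)

# Placeholder only: clusters with a large control/treatment difference,
# which are kept despite being flagged. Not taken from the source.
protected <- c(21)

# Clusters flagged by either criterion, minus the protected ones
flagged            <- union(low_umi, high_mt)
clusters_to_remove <- setdiff(flagged, protected)

# Toy vector of per-cell cluster labels; TRUE marks cells that are kept
cell_clusters <- c(1, 4, 8, 21, 21, 13, 2)
keep <- !(cell_clusters %in% clusters_to_remove)

sort(clusters_to_remove)
```

The logical vector `keep` can then be used to subset the columns (cells) of the object holding the data.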
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] dplyr_1.0.5 scater_1.18.6
## [3] ggplot2_3.3.3 here_1.0.1
## [5] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
## [7] Biobase_2.50.0 GenomicRanges_1.42.0
## [9] GenomeInfoDb_1.26.2 IRanges_2.24.1
## [11] S4Vectors_0.28.1 BiocGenerics_0.36.0
## [13] MatrixGenerics_1.2.1 matrixStats_0.58.0
##
## loaded via a namespace (and not attached):
## [1] rsvd_1.0.3 Rcpp_1.0.6
## [3] lattice_0.20-41 assertthat_0.2.1
## [5] rprojroot_2.0.2 digest_0.6.27
## [7] utf8_1.1.4 R6_2.5.0
## [9] evaluate_0.14 highr_0.8
## [11] pillar_1.5.1 sparseMatrixStats_1.2.1
## [13] zlibbioc_1.36.0 rlang_0.4.10
## [15] irlba_2.3.3 jquerylib_0.1.3
## [17] Matrix_1.3-3 rmarkdown_2.7
## [19] labeling_0.4.2 BiocNeighbors_1.8.2
## [21] BiocParallel_1.24.1 stringr_1.4.0
## [23] RCurl_1.98-1.2 munsell_0.5.0
## [25] beachmat_2.6.4 DelayedArray_0.16.2
## [27] vipor_0.4.5 BiocSingular_1.6.0
## [29] compiler_4.0.4 xfun_0.21
## [31] pkgconfig_2.0.3 ggbeeswarm_0.6.0
## [33] htmltools_0.5.1.1 tidyselect_1.1.0
## [35] gridExtra_2.3 tibble_3.1.0
## [37] GenomeInfoDbData_1.2.4 viridisLite_0.3.0
## [39] fansi_0.4.2 crayon_1.4.1
## [41] withr_2.4.1 bitops_1.0-6
## [43] grid_4.0.4 jsonlite_1.7.2
## [45] gtable_0.3.0 lifecycle_1.0.0
## [47] DBI_1.1.1 magrittr_2.0.1
## [49] scales_1.1.1 stringi_1.5.3
## [51] scuttle_1.0.4 farver_2.1.0
## [53] XVector_0.30.0 viridis_0.5.1
## [55] bslib_0.2.4 DelayedMatrixStats_1.12.3
## [57] ellipsis_0.3.1 generics_0.1.0
## [59] vctrs_0.3.6 cowplot_1.1.1
## [61] tools_4.0.4 beeswarm_0.3.1
## [63] glue_1.4.2 purrr_0.3.4
## [65] yaml_2.2.1 colorspace_2.0-0
## [67] knitr_1.31 sass_0.3.1